Transcriptional bursts: a unified model of machines and mechanisms
Transcription is the process whereby RNA molecules are polymerized by molecular machines,
called RNA polymerase (RNAP), using the corresponding DNA as the template. Recent in-vivo
experiments with single cells have established that transcription takes place in "bursts" or "pulses".
In this letter we present a model that captures not only the mechano-chemistry of individual RNAPs 
and their steric interactions but also the switching of the gene between the ON and OFF states. 
This model accounts for the statistical properties of the transcriptional bursts. It also shows how the 
quantitative features of the distributions of these bursts can be tuned by controlling the appropriate 
steps of operation of the RNAP machines. 

PACS numbers: 87.16.dj; 87.18.Tt



Genetic messages are encoded chemically in DNA. During gene expression this message is first
transcribed into mRNA and then, from it, translated into proteins by the well-coordinated
operation of intracellular machineries. The two machines that play key roles in transcription
and translation are the RNA polymerase (RNAP) and the ribosome, respectively. Each of these
machines is like a mobile workshop that synthesizes a bio-polymer according to a template
which also serves as the track for the movement of the workshop. Because of the probabilistic
nature of the steps of gene expression, the numbers of mRNA and protein molecules
corresponding to a single gene fluctuate randomly. It has been observed experimentally that
relatively long periods $T_{\rm off}$ of transcriptional inactivity are interspersed with brief
periods $T_{\rm on}$ of transcriptional "bursts". Several statistical properties of these random
"bursts" (or, "pulses") have been used to characterize the temporal pattern in
transcriptional events.

Qualitatively similar bursts of transcriptional activity have been observed in both
prokaryotes and eukaryotes, and several possible mechanisms of transcriptional bursts have
been suggested. Transcription, i.e., the synthesis of RNA from the corresponding DNA
template, can be broadly divided into three stages, namely, initiation, elongation and
termination. When the gene is switched "ON", initiation of transcription by RNAPs can take
place until the gene switches back to the "OFF" state [13]. Unbinding and binding of
transcription repressor molecules can give rise to such switching "ON" and "OFF" of a
bacterial gene. In eukaryotic cells, chromatin remodelling enzymes can act as activators of
transcription. Even if the gene does not switch OFF, burst-like transcriptional activity is
possible if several RNAPs queue up behind a stalled RNAP and then, suddenly, the stalled
RNAP gets reactivated. This effect becomes very pronounced when pausing is caused by
backtracking of the RNAP.

To our knowledge, the switching of the gene between the active (ON) and inactive (OFF)
states is a common feature of almost all the models of transcriptional noise. However, these
models capture the entire process of RNA production by a single effective rate constant. In
contrast, models developed to understand the operational mechanisms of RNAP motors
explicitly describe the different stages of transcription, namely, initiation, elongation and
termination, but (except for ref. [28]) do not address the question of temporal fluctuations
in transcription. The main aim of this letter is to combine the key features of these two
types of models within a single unified theoretical framework.

FIG. 1: (Color online) A pictorial depiction of the model. The three dashed squares
represent three TECs. The solid lines connecting filled circles represent the single strands
of DNA while the string of open circles denotes the elongating RNA molecule. The dashed
lines connecting the circles denote the unbroken non-covalent bonds between the
complementary subunits on the DNA and RNA strands. Each of the grey ovals represents the
catalytic site on the corresponding RNAP. The green and red squares indicate the ON and OFF
states of the gene. The rates of the transitions between the ON and the OFF states as well as
the rate of transcription initiation in the ON state of the gene are also shown explicitly.

More specifically, we extend our recent model of RNAP traffic [28] by allowing the gene to
switch between the "ON" and "OFF" states. In other words, this extended model explicitly
describes the following processes: (i) switching "ON" and "OFF" of the gene, (ii)
initiation, elongation and termination of transcription, (iii) mechano-chemical cycles of
the individual RNAP motors in the elongation stage, and (iv) congestion of traffic of RNAPs
caused by their steric interaction. Consequently, this model can predict the contributions
of the processes (i)-(iv) to transcriptional noise; estimation of the contributions made by
the processes (ii)-(iv) was beyond the scope of all the earlier models of transcriptional
noise.

FIG. 2: (Color online) Allowed mechano-chemical transitions of individual RNAPs during the
elongation stage in our model. The indices $j-1$, $j$, $j+1$ denote an arbitrary sequence of
three nucleotides on the template DNA. The encircled symbols 1 and 2 denote the two possible
chemical states; no pyrophosphate (PPi) is bound to the RNAP in the state 1 whereas the
PPi-bound state is labelled by the index 2. The directions of the arrows and the associated
symbols indicate the possible transitions and the corresponding rate constants,
respectively. Elongation of the nascent RNA transcript is accompanied by forward movement of
the RNAP whereas backward movements of the RNAP correspond to depolymerization of the RNA.
The full model, shown in (a), allows mechano-chemical transitions which branch off the
dominant pathway of an individual RNAP shown in (b).

FIG. 3: (Color online) A typical time series of the transcriptional events in the model;
each vertical bar corresponds to the completion of polymerization of an RNA molecule.

FIG. 4: (Color online) The distribution of the sizes of the transcriptional bursts in our
model plotted using $\Delta t = 0.5$ min ($\Delta t = 2.5$ min in the inset). The continuous
line (red) is obtained from the theoretically predicted form (6). The data points, plotted
as bars, were obtained from computer simulations of the model.

FIG. 5: (Color online) Distribution of the durations of transcriptional bursts in our model
plotted using $\Delta t = 0.5$ min ($\Delta t = 2.5$ min in the inset). The continuous lines
(red) are obtained from the theoretically predicted form (7). The data points, plotted as
bars, were obtained from computer simulations of the model. The black dots in the inset
represent the experimental data reported by Chubb et al. in ref. [11].

Carrying out computer simulations of this model we obtain the time series of the
transcriptional events. We sort the transcriptional events of each time series obtained from
our simulations into "bursts" by using well-defined criteria (which we describe below). We
compare various statistical properties of these theoretically predicted transcriptional
bursts with the corresponding experimental results. We then suggest an alternative
statistical analysis of the transcriptional noise in terms of some new distributions which
are motivated by superficial similarities between RNAP traffic and vehicular traffic. We
also derive an approximate analytical expression for the corresponding distribution in a
simplified special case and demonstrate its use by comparing it with the corresponding data
obtained from computer simulations.

FIG. 6: (Color online) Distribution of the intervals between successive bursts of
transcriptional activities in our model plotted using $\Delta t = 0.5$ min ($\Delta t = 2.5$
min in the inset). The continuous lines (red) are obtained from the theoretically predicted
form (8). The data points, plotted as bars, were obtained from computer simulations of the
model.

FIG. 7: Distribution of the time headways between the successive RNAPs in our model
(triangles). The filled circles show the time headway distribution in the hypothetical
scenario (which was assumed in ref. [28]) where the gene remains in the ON state throughout
the period of computation. The inset shows the long tail of the time headway distribution
which corresponds to the time gaps between the successive bursts.

Before presenting our quantitative model, we summarize a few essential steps in
transcription. The RNAP locally unzips the two DNA strands, creating a "bubble" whereby a
single-stranded DNA (ssDNA) template is exposed to it. Together with the DNA bubble and the
growing RNA transcript, the RNAP forms a macromolecular complex called the "transcription
elongation complex" (TEC). The size of a single TEC is such that each TEC covers $r$
successive nucleotides of the DNA template. During elongation, each mechano-chemical cycle
of the RNAP consists of several steps. The major steps of this cycle involve the selection
of the appropriate subunit for the mRNA, as dictated by the DNA template, and then its
attachment to the growing mRNA transcript by a reaction that is catalyzed by the RNAP.
Release of pyrophosphate (PPi), one of the products of this reaction, is the rate-limiting
step in each cycle. In each cycle, an RNAP steps forward by one nucleotide. The elongation
process ends when the TEC encounters the corresponding "termination sequence" and the
nascent mRNA is released by the RNAP.

FIG. 8: Comparison between the approximate analytical expression (10) for the TH
distribution (denoted by the line) and the corresponding simulation data (denoted by the
discrete data points) in the special case where $r = 1$; periodic boundary conditions are
imposed and the gene remains "ON" during the entire duration of observation. For the two
sets of simulation data labelled as "simulation 1" and "simulation 2" the mechano-chemical
transitions shown in figs. 2(b) and 2(a), respectively, have been used.

Our model of transcription is shown schematically in fig. 1, where the essential components
of each of the TECs are shown explicitly. The green and red squares at the start region of
the gene indicate the "ON" and "OFF" states of the gene, respectively. The rate constant
(i.e., probability per unit time) of transition from the "OFF" state to the "ON" state is
denoted by the symbol $\omega_{\rm on}$ whereas that of the reverse transition is denoted by
$\omega_{\rm off}$. Initiation and termination of transcription are captured by the same
prescriptions that were used in our earlier work reported in ref. [28], the corresponding
rate constants being $\omega_{\alpha}$ and $\omega_{\beta}$, respectively.

In our model, the mechano-chemical cycle of individual RNAPs in the elongation stage and the
nature of their steric interactions are identical to those used in ref. [28]. For the sake
of completeness, all the possible mechano-chemical transitions of an individual RNAP during
the elongation stage are shown in fig. 2(a). Since pyrophosphate release is the
rate-limiting step, we assume that, at any given instant of time, an RNAP can exist in one
of the two possible "chemical" states; no pyrophosphate (PPi) is bound to the RNAP in the
state 1 whereas the PPi-bound state of the RNAP is labelled by the index 2. The rate of PPi
release is denoted by $\omega_{12}$ while the reverse reaction takes place at the rate
$\omega_{21}$. The rate constants $\omega^{f}_{21}$, $\omega^{f}_{11}$ and $\omega^{f}_{22}$
correspond to polymerization of RNA whereas the rate constants $\omega^{b}_{12}$,
$\omega^{b}_{11}$ and $\omega^{b}_{22}$ correspond to depolymerization of the RNA.

For our numerical calculations, we have used the same set of rate constants which we used in
ref. [28]; these are as follows:

(1)

where [NTP], [NMP] and [PPi] denote the concentrations of nucleoside triphosphate (NTP),
nucleoside monophosphate (NMP) and pyrophosphate (PPi), respectively. Moreover, for the
figures in this letter, we have used $\omega_{\alpha} = 5.0$ s$^{-1}$,
$\omega_{\beta} = 50$ s$^{-1}$, $\omega_{\rm off} = 0.01$ s$^{-1}$,
$\omega_{\rm on} = 0.001$ s$^{-1}$, and the concentrations [NTP] = $10^{-4}$ M,
[PPi] = $10^{-6}$ M, [NMP] = $10^{-6}$ M.

Note that, in spite of all the possible transitions shown in fig. 2(a), the dominant pathway
is the one shown in fig. 2(b), where $\omega^{f}_{21}$ is proportional to the concentration
of the available NTP subunits. All the quantitative predictions of our theory would remain
equally valid if the RNAP remains immobilized while the template DNA passes through it in
steps of one base pair, a scenario based on the concept of the "transcriptional factory"
in vivo [29]. This alternative scenario would be mathematically related to the one used in
this paper by just a coordinate transformation [30], from the rest frame of the template DNA
to that of the RNAP.

Mere visual examination of the time series of the transcriptional events (see Fig. 3 for a
typical one) establishes the occurrence of random bursts of transcriptional activity in our
model [14]. In order to sort these events into separate bursts, we use a time resolution
$\Delta t$. Within a burst, each transcriptional event is separated from its immediately
preceding and succeeding events by time gaps smaller than $\Delta t$, while the time gap
between any pair of successive bursts is at least $\Delta t$. Our choice of
$\Delta t = 2.5$ min is motivated by the corresponding choice in typical laboratory
experiments. We have also analyzed the same data using $\Delta t = 0.5$ min to test whether
the conclusions drawn from our sorting procedure are, indeed, robust.
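The sorting criterion can be illustrated with a minimal Python sketch. It is not the code used for the simulations reported here; the function names `sort_into_bursts` and `burst_statistics` and the sample event times are hypothetical, chosen only for illustration. Events separated by a gap smaller than the resolution are placed in the same burst, and a gap of at least the resolution starts a new burst.

```python
def sort_into_bursts(event_times, dt):
    """Group event times into bursts: a gap smaller than dt keeps two
    successive events in the same burst; a gap >= dt starts a new burst."""
    bursts = []
    for t in sorted(event_times):
        if bursts and t - bursts[-1][-1] < dt:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


def burst_statistics(bursts):
    """Return burst sizes, burst durations and intervals between bursts."""
    sizes = [len(b) for b in bursts]
    durations = [b[-1] - b[0] for b in bursts]
    # Interval: last event of the earlier burst to first event of the later one.
    intervals = [nxt[0] - cur[-1] for cur, nxt in zip(bursts, bursts[1:])]
    return sizes, durations, intervals


# Made-up completion times of transcriptional events, in minutes.
events = [0.0, 0.2, 0.5, 4.0, 4.3, 10.0]
bursts = sort_into_bursts(events, dt=2.5)
sizes, durations, intervals = burst_statistics(bursts)
print(sizes, durations, intervals)
```

For this made-up series the same grouping is obtained with a resolution of 0.5 min, which is the kind of robustness check described above.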

The number of transcriptional events in a burst is a measure of its size. The probability of
the occurrence of a burst of size $n$ is given by

$$P(n) = P_{\rm on}\, p_{\rm tr}^{\,n}\, P_{\rm off}, \qquad (2)$$

where $P_{\rm on}$ and $P_{\rm off}$ are the probabilities of the gene switching ON and OFF,
respectively, while $p_{\rm tr}$ is the probability that a transcriptional event is completed
by an RNAP. We can recast equation (2) into the exponential form

$$P(n) = P_{\rm on}\, P_{\rm off}\, \exp(-n/b), \qquad (3)$$

where $1/b = -\ln p_{\rm tr}$. Obviously, $b$ is the average size of a transcriptional burst;
the higher the magnitude of $p_{\rm tr}$, the larger the average size of the bursts.

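Our model goes beyond most of the earlier models of noise in transcription of a single gene
because it can predict the explicit dependence of $P_{\rm on}$, $P_{\rm off}$ and
$p_{\rm tr}$ on the rates of the steps of the mechano-chemical cycles of individual RNAPs as
well as on their interactions. Suppose $\omega_{\rm eff}$ is the effective rate constant
associated with the process of forward movement of the RNAP by one site (i.e., one
nucleotide). Considering only the dominant pathway shown in fig. 2(b),

$$\frac{1}{\omega_{\rm eff}} = \frac{1}{\omega_{12}} + \frac{1}{\omega^{f}_{21}}$$

and, hence,

$$\omega_{\rm eff} = \frac{\omega_{12}\,\omega^{f}_{21}}{\omega_{12} + \omega^{f}_{21}}.$$

An RNAP can attach to the DNA strand only after the preceding RNAP vacates the initial $r$
sites on the lattice. The rate at which an RNAP moves by $r$ sites is
$k_{\rm eff} = \omega_{\rm eff}/r$. Since $k_{\rm eff} \ll \omega_{\alpha}$, the
rate-limiting step in the process of transcription will be the initiation, which will be
determined essentially by $k_{\rm eff}$. Hence,
$p_{\rm tr} \propto \exp[-1/(k_{\rm eff}\langle T_{\rm on}\rangle)]$, where
$\langle T_{\rm on}\rangle = 1/\omega_{\rm off}$. Thus,

$$p_{\rm tr} \propto \exp\left(-\frac{\omega_{\rm off}}{k_{\rm eff}}\right). \qquad (4)$$

Moreover,

$$P_{\rm on} = \frac{\omega_{\rm on}}{\omega_{\rm on} + \omega_{\rm off}} \quad {\rm and} \quad P_{\rm off} = \frac{\omega_{\rm off}}{\omega_{\rm on} + \omega_{\rm off}}. \qquad (5)$$

Finally, after normalization, the discrete distribution of the burst sizes is given by

$$P(n) = \left[1 - \exp\left(-\frac{\omega_{\rm off}}{k_{\rm eff}}\right)\right] \exp\left(-\frac{\omega_{\rm off}}{k_{\rm eff}}\, n\right). \qquad (6)$$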
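As a numerical illustration of the chain from the elongation rates to the burst-size distribution, the following Python sketch evaluates the effective hopping rate along the dominant pathway (PPi release followed by polymerization), the rate of clearing $r$ sites, and a normalized exponential burst-size distribution of the form (6). The values chosen for the PPi-release rate, the polymerization rate and $r$ are hypothetical placeholders, not the rate constants of Eq. (1); only the OFF-switching rate of 0.01 per second is taken from the text. The sketch also assumes that the distribution is normalized over $n = 0, 1, 2, \ldots$

```python
import math


def omega_eff(w12, w21f):
    """Effective one-site hopping rate: 1/w_eff = 1/w12 + 1/w21f."""
    return w12 * w21f / (w12 + w21f)


def burst_size_distribution(w_off, k_eff, n_max):
    """P(n) = [1 - exp(-w_off/k_eff)] exp(-n w_off/k_eff), n = 0..n_max."""
    x = math.exp(-w_off / k_eff)
    return [(1.0 - x) * x**n for n in range(n_max + 1)]


# Hypothetical elongation rates (per second) and TEC size, for illustration only.
w12, w21f, r = 31.4, 100.0, 10
w_off = 0.01                        # rate of switching OFF, as quoted in the text

k_eff = omega_eff(w12, w21f) / r    # rate of vacating r sites
P = burst_size_distribution(w_off, k_eff, n_max=5000)

b = k_eff / w_off                   # scale of the exponential, 1/b = w_off/k_eff
mean_n = sum(n * p for n, p in enumerate(P))
print(f"k_eff = {k_eff:.3f} per s, b = {b:.1f}, mean burst size = {mean_n:.1f}")
```

For a discrete exponential distribution the mean is $1/(e^{1/b}-1)$, which approaches $b$ when $b \gg 1$; this is the sense in which $b$ measures the average burst size in the sketch.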



A typical distribution of the sizes of the bursts, obtained from computer simulations of our
model, is plotted in Fig. 4 using two different values of $\Delta t$. These data are in
excellent agreement with the theoretically predicted distribution (6); this exponential
distribution is also consistent with the corresponding experimental observations. Moreover,
the data plotted in the inset of Fig. 4 also fit an exponential distribution, thereby
establishing that our conclusion is robust and independent of the actual magnitude of
$\Delta t$ as long as it remains within a reasonable range.

The duration of a burst is measured by the time interval between the first and the last
transcriptional events which are members of the same burst. It is straightforward to see
that the normalized distribution $P(t_{\rm dur})$ of the burst durations $t_{\rm dur}$ is
given by

$$P(t_{\rm dur}) = \omega_{\rm on} \exp(-\omega_{\rm on}\, t_{\rm dur}). \qquad (7)$$

The theoretically predicted exponential distribution (7) is in excellent quantitative
agreement with the corresponding numerical data obtained by direct computer simulations (see
Fig. 5). The experimental data reported by Chubb et al. [11] are also plotted in the inset
of Fig. 5. The nature of the distribution (namely, the exponential form) established by our
theory and simulation is consistent with that observed in the experiments. The quantitative
difference between the results predicted by our model and those obtained from experiments
arises from the fact that the rate constants for the system used in the experiments are not
necessarily identical to those used in plotting the results of our theory and simulation.

The time interval between two successive bursts is the time gap between the last event of
the earlier burst and the first event of the later burst. The normalized distribution
$P(t_{\rm int})$ of the intervals $t_{\rm int}$ between successive bursts is given by

$$P(t_{\rm int}) = \omega_{\rm off} \exp(-\omega_{\rm off}\, t_{\rm int}). \qquad (8)$$

This theoretical prediction agrees quantitatively with the corresponding simulation data
(see Fig. 6) and is also consistent with the form of the distribution indicated by the
experimental data reported by Chubb et al. [11]. However, because of the large scatter in
the experimental data, no quantitative comparison between our theoretical predictions and
experimental observations could be made.

Drawing an analogy to vehicular traffic [31], we define the time headway to be the time gap
between the departures of the successive RNAPs from the termination site. Thus, according to
this definition, the time headway is the time gap between the completion of the synthesis of
successive RNA molecules. A typical distribution of the time headways is plotted in Fig. 7.
In the same figure we have also plotted the time headway distribution for a hypothetical
scenario (which was considered in ref. [28]) where the gene always remains ON. The best fits
to both these curves are gamma distributions (with slightly different parameters). A
comparison between these two curves shows that the switching ON and OFF of the gene leads to
a weak broadening of the distribution; the longer tail caused by the gaps between the
successive bursts is shown separately in the inset of Fig. 7.

We have been able to obtain an analytical estimate of the time-headway (TH) distribution
only in a special limiting case, exploiting the formal analogy with the models of vehicular
traffic [31]. Approximating the mechano-chemical cycle of each RNAP during the elongation
stage by the pathway shown in fig. 2(b), we can represent each RNAP (or, more precisely,
each TEC) by a rigid rod, of length $r$, which can hop from one nucleotide to the next on
the template DNA with an effective hopping probability $q$ per time step. Thus,

$$q \simeq \omega_{12}\, dt, \qquad (9)$$

where $dt$ is the duration of each discretized time step. In this limit our original model
reduces to the totally asymmetric simple exclusion process (TASEP) for hard rods of length
$r$ [32], provided the gene always remains in the ON state. In the special case $r = 1$ the
rods reduce to particles and the corresponding exact TH distribution for TASEP (with
parallel updating) is given by

$$
\begin{aligned}
P_{\rm TH}(t) ={}& \frac{qy}{\rho - y}\left\{1 - \frac{qy}{\rho}\right\}^{t-1}
+ \frac{qy}{(1-\rho) - y}\left\{1 - \frac{qy}{1-\rho}\right\}^{t-1} \\
&- \left[\frac{qy}{\rho - y} + \frac{qy}{(1-\rho) - y}\right](1-q)^{t-1}
- q^{2}(t-1)(1-q)^{t-2}, \qquad (10)
\end{aligned}
$$

where

$$y = \frac{1}{2q}\left(1 - \sqrt{1 - 4q\rho(1-\rho)}\right) \qquad (11)$$

and $\rho$ is the number density of the particles.
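The closed-form TH distribution for $r = 1$ is easy to evaluate numerically. The following Python sketch is an illustration only: the function name `th_distribution` and the parameter values are hypothetical, and the expression is written out explicitly in terms of the hopping probability $q$, the number density $\rho$ and the time headway $t$ in units of time steps, under the assumption $0 < q < 1$. It checks two properties that such a distribution must satisfy: it sums to one, and its mean equals the inverse of the stationary flux $qy$.

```python
import math


def th_distribution(t, q, rho):
    """Time-headway distribution of the parallel-update TASEP (r = 1)
    for integer t >= 1, hopping probability 0 < q < 1 and density rho."""
    d = 1.0 - rho
    y = (1.0 - math.sqrt(1.0 - 4.0 * q * rho * d)) / (2.0 * q)
    a = q * y / (rho - y)
    b = q * y / (d - y)
    term1 = a * (1.0 - q * y / rho) ** (t - 1)
    term2 = b * (1.0 - q * y / d) ** (t - 1)
    term3 = (a + b) * (1.0 - q) ** (t - 1)
    # The last term vanishes at t = 1 because of the factor (t - 1).
    term4 = q * q * (t - 1) * (1.0 - q) ** (t - 2) if t >= 2 else 0.0
    return term1 + term2 - term3 - term4


q, rho = 0.5, 0.3                      # illustrative values
pmf = [th_distribution(t, q, rho) for t in range(1, 2001)]
total = sum(pmf)
mean_th = sum(t * p for t, p in zip(range(1, 2001), pmf))
y = (1.0 - math.sqrt(1.0 - 4.0 * q * rho * (1.0 - rho))) / (2.0 * q)
print(f"sum = {total:.6f}, mean headway = {mean_th:.4f}, 1/(q*y) = {1.0 / (q * y):.4f}")
```

The distribution vanishes at $t = 1$: with parallel updating a particle cannot enter a site in the same time step in which its predecessor leaves it, so the shortest possible headway is two time steps.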

In order to test the range of validity of the expression (10) in the context of RNAP
traffic, we have carried out computer simulations of our model for $r = 1$ under periodic
boundary conditions keeping the gene always ON (see Fig. 8). In the first set of
simulations, we have used the simplified mechano-chemical cycle shown in fig. 2(b) whereas
in the second set we retained all the mechano-chemical transitions allowed in fig. 2(a). The
expression (10) is in excellent agreement with the simulation data for the unbranched
mechano-chemical cycle shown in fig. 2(b). Moreover, even when the branched pathways of
fig. 2(a) exist, the simulation data are in reasonably good agreement with (10).
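A simulation of this kind can be sketched in a few lines of Python for the simplest case, namely particles ($r = 1$) hopping with probability $q$ per time step on a ring under parallel updating. This is a minimal stand-in, not the simulation code behind the reported data: it ignores the internal chemical states of the RNAP, and the function name `simulate_headways`, the lattice size, the particle number and the seed are all hypothetical choices. Time headways are recorded at a single bond of the ring after a warm-up period.

```python
import random


def simulate_headways(L=100, N=30, q=0.5, steps=20000, warmup=2000, seed=1):
    """Parallel-update TASEP on a ring of L sites with N particles;
    returns the time headways (in time steps) measured at one bond."""
    rng = random.Random(seed)
    occupied = [False] * L
    for site in rng.sample(range(L), N):      # random initial placement
        occupied[site] = True

    headways, last_cross = [], None
    for step in range(1, steps + 1):
        # Parallel update: all hops are decided from the same old configuration.
        movers = [i for i in range(L)
                  if occupied[i] and not occupied[(i + 1) % L]
                  and rng.random() < q]
        for i in movers:
            occupied[i] = False
            occupied[(i + 1) % L] = True
        # Detector on the bond between site 0 and site 1.
        if step > warmup and 0 in movers:
            if last_cross is not None:
                headways.append(step - last_cross)
            last_cross = step
    return headways


L, N, q = 100, 30, 0.5
headways = simulate_headways(L=L, N=N, q=q)
rho = N / L
# Stationary flux of the parallel-update TASEP at density rho.
flux = 0.5 * (1.0 - (1.0 - 4.0 * q * rho * (1.0 - rho)) ** 0.5)
mean_headway = sum(headways) / len(headways)
print(f"mean headway = {mean_headway:.2f} steps, 1/flux = {1.0 / flux:.2f} steps")
```

A histogram of `headways` can then be compared with the analytical TH distribution; the mean headway should be close to the inverse of the stationary flux, up to statistical and finite-size corrections.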

In this letter we have reported a model that is ideally suited to study the effects of the
steps of the mechano-chemical cycle of individual RNAPs and their steric interactions on the
transcriptional bursts which are caused primarily by the switching of the gene between "ON"
and "OFF" states. For the sake of simplicity, we have illustrated our approach with a
minimal model of RNAP mechano-chemistry which assigns only two possible distinct chemical
states to an RNAP at any given location. For a more (biologically) realistic description,
this mechano-chemistry of the RNAPs can be easily replaced by a more appropriate one without
changing the overall framework of our model.

Suppose $T_{\rm obs}$ is the total time interval of observation and data collection in each
single-cell experiment on transcriptional noise. In ref. [28], our theoretical analysis was
restricted to a temporal regime such that (I) $T_{\rm obs} < T_{\rm on}$, where
$T_{\rm on}$ is the average duration for which the gene remains ON, and (II)
$T_{\rm obs} \ll T_{\rm cell}$, where $T_{\rm cell}$ is the mean lifetime of the cell before
its division into the two daughter cells. Under these restrictions, our model of
transcription [28] did not exhibit transcriptional bursts. In this letter we have shown that
the same model can account for transcriptional bursts when we relax the constraint (I). We
have shown that the statistical properties of noisy transcription in our model in the
temporal regime $T_{\rm on} \ll T_{\rm obs} < T_{\rm cell}$ are consistent in form with the
corresponding experimental data. Moreover, drawing an analogy to vehicular traffic, we have
reanalyzed the time series of the transcriptional events from a totally different
perspective which does not require any sorting of the raw data into separate bursts.

This work is supported (through DC) by a research grant from CSIR (India). DC also
acknowledges the hospitality of CCMT of IISc, Bangalore, during the preparation of this
manuscript.